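This paper presents an expressive speech synthesis architecture for modeling and controlling the speaking style at the word level. It attempts to learn word-level style and prosody representations of the speech data with the aid of two encoders. The first models style by finding a combination of style tokens for each word given the acoustic features; the second outputs a word-level sequence conditioned only on the phonetic information, in order to disentangle it from the style information. The two encoder outputs are aligned and concatenated with the phoneme encoder outputs and then decoded with a Non-Attentive Tacotron model. An additional prior encoder is used to predict the style tokens autoregressively, so that the model can run without a reference utterance. We find that the resulting model gives both word-level and global control over the style, as well as prosody transfer capabilities.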
translated by 谷歌翻译
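This paper presents an end-to-end text-to-speech system with low CPU latency, suitable for real-time applications. The system is composed of an autoregressive attention-based sequence-to-sequence acoustic model and the LPCNet vocoder for waveform generation. An acoustic model architecture that adopts modules from both the Tacotron 1 and Tacotron 2 models is proposed, while stability is ensured by using a recently proposed location-based attention mechanism that is suitable for arbitrary sentence lengths. During inference, the decoder is unrolled and acoustic feature generation is performed in a streaming manner, allowing for a nearly constant latency that is independent of the sentence length. Experimental results show that the acoustic model can produce feature sequences about 31 times faster than real time on a computer CPU and 6.5 times faster on a mobile CPU, enabling it to meet the conditions required for real-time applications on both devices. The full end-to-end system can generate speech of almost natural quality, which is verified by listening tests.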
Machine learning (ML) algorithms are remarkably good at approximating complex non-linear relationships. Most ML training processes, however, are designed to deliver ML tools with good average performance and do not offer any guarantees about their worst-case estimation error. For safety-critical systems such as power systems, this poses a major barrier to their adoption. So far, approaches could determine the worst-case violations only of already trained ML algorithms. To the best of our knowledge, this is the first paper to introduce a neural network training procedure designed to achieve both good average performance and minimum worst-case violations. Using the Optimal Power Flow (OPF) problem as a guiding application, our approach (i) introduces a framework that reduces the worst-case generation constraint violations during training, incorporating them as a differentiable optimization layer; and (ii) presents a neural network sequential learning architecture to significantly accelerate it. We demonstrate the proposed architecture on four different test systems ranging from 39 buses to 162 buses, for both AC-OPF and DC-OPF applications.
Industry 4.0 aims to optimize the manufacturing environment by leveraging new technological advances, such as new sensing capabilities and artificial intelligence. The DRAEM technique has shown state-of-the-art performance for unsupervised classification. The ability to create anomaly maps highlighting areas where defects probably lie can be leveraged to provide cues to supervised classification models and enhance their performance. Our research shows that the best performance is achieved when training a defect detection model by providing an image and the corresponding anomaly map as input. Furthermore, such a setting provides consistent performance when framing the defect detection as a binary or multiclass classification problem and is not affected by class balancing policies. We performed the experiments on three datasets with real-world data provided by Philips Consumer Lifestyle BV.
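The abstract above does not say how the image and its anomaly map are combined at the model input. One common option, shown here purely as an assumption, is to append the anomaly map as an extra channel. The function name `stack_image_and_anomaly_map` and the channels-last layout are hypothetical choices for this sketch, not details taken from the paper.

```python
import numpy as np


def stack_image_and_anomaly_map(image, anomaly_map):
    """Append a single-channel anomaly map to an image as an extra channel.

    Assumption (not from the paper): the image is channels-last with shape
    (H, W, C) and the anomaly map has shape (H, W).
    """
    image = np.asarray(image, dtype=np.float32)
    anomaly_map = np.asarray(anomaly_map, dtype=np.float32)
    if image.ndim != 3:
        raise ValueError("image must have shape (H, W, C)")
    if anomaly_map.shape != image.shape[:2]:
        raise ValueError("anomaly map must match the image height and width")
    # Result has shape (H, W, C + 1); the last channel is the anomaly map.
    return np.concatenate([image, anomaly_map[..., None]], axis=-1)
```

A classifier that consumes this tensor would need its first layer to accept `C + 1` input channels.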
Quality control is a crucial activity performed by manufacturing companies to ensure their products conform to the requirements and specifications. The introduction of artificial intelligence models makes it possible to automate visual quality inspection, speeding up the inspection process and ensuring all products are evaluated under the same criteria. In this research, we compare supervised and unsupervised defect detection techniques and explore data augmentation techniques to mitigate the data imbalance in the context of automated visual inspection. Furthermore, we use Generative Adversarial Networks for data augmentation to enhance the classifiers' discriminative performance. Our results show that state-of-the-art unsupervised defect detection does not match the performance of supervised models but can be used to reduce the labeling workload by more than 50%. Furthermore, the best classification performance was achieved with GAN-based data generation, with AUC ROC scores equal to or higher than 0.9898, even when increasing the dataset imbalance by leaving only 25% of the images denoting defective products. We performed the research with real-world data provided by Philips Consumer Lifestyle BV.
The cyber-physical convergence is opening up new business opportunities for industrial operators. The need for deep integration of the cyber and the physical worlds establishes a rich business agenda towards consolidating new system and network engineering approaches. This revolution would not be possible without rich and heterogeneous sources of data and the ability to exploit them intelligently, since data will serve as a fundamental resource to promote Industry 4.0. One of the most fruitful research and practice areas emerging from this data-rich, cyber-physical, smart factory environment is the data-driven process monitoring field, which applies machine learning methodologies to enable predictive maintenance applications. In this paper, we examine popular time series forecasting techniques as well as supervised machine learning algorithms in the applied context of Industry 4.0, by transforming and preprocessing the historical industrial dataset of a packing machine's operational state recordings (real data coming from the production line of a manufacturing plant in the food and beverage domain). In our methodology, we use only a single signal concerning the machine's operational status to make our predictions, without considering other operational variables or fault and warning signals, hence its characterization as "agnostic". The results demonstrate that the adopted methods achieve quite promising performance on three targeted use cases.
Climate change is expected to aggravate wildfire activity through the exacerbation of fire weather. Improving our capabilities to anticipate wildfires on a global scale is of utmost importance for mitigating their negative effects. In this work, we create a global fire dataset and demonstrate a prototype for predicting the presence of global burned areas on a sub-seasonal scale using deep learning segmentation models. In particular, we present an open-access, global, analysis-ready datacube, which contains a variety of variables related to the seasonal and sub-seasonal fire drivers (climate, vegetation, oceanic indices, human-related variables), as well as the historical burned areas and wildfire emissions for 2001-2021. We train a deep learning model, which treats global wildfire forecasting as an image segmentation task and skillfully predicts the presence of burned areas 8, 16, 32 and 64 days ahead of time. Our work motivates the use of deep learning for global burned area forecasting and paves the way towards improved anticipation of global wildfire patterns.
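Gas network planning optimization under emission constraints prioritizes gas supply with the lowest CO$_2$ intensity. Because this problem includes complex physical laws of gas flow, standard optimization solvers cannot guarantee convergence to a feasible solution. To address this issue, we develop an input-convex neural network (ICNN) aided optimization routine, which incorporates a set of trained ICNNs that approximate the gas flow equation with high precision. Numerical tests on the Belgian gas network show that the ICNN-aided optimization dominates non-convex and relaxation-based solvers, with larger optimality gains for stricter emission targets. Moreover, whenever the non-convex solver fails, the ICNN-aided optimization provides a feasible solution for network planning.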
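To illustrate what "input-convex" means in practice, the following is a minimal sketch of a small, untrained ICNN forward pass. It is not the authors' model: the class name `TinyICNN`, the layer sizes, and the random weights are assumptions made for this example. The property it shows is the standard ICNN construction: weights acting on hidden activations are non-negative and the activation (ReLU) is convex and non-decreasing, so the scalar output is convex in the input.

```python
import numpy as np


def relu(v):
    """ReLU is convex and non-decreasing, which preserves convexity."""
    return np.maximum(v, 0.0)


class TinyICNN:
    """Two-hidden-layer input-convex network with random (untrained) weights.

    Hypothetical illustration only; sizes and initialization are assumptions.
    """

    def __init__(self, in_dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Weights on the raw input x may have any sign.
        self.Wx0 = rng.normal(size=(hidden, in_dim))
        self.b0 = rng.normal(size=hidden)
        self.Wx1 = rng.normal(size=(hidden, in_dim))
        self.b1 = rng.normal(size=hidden)
        self.wx_out = rng.normal(size=in_dim)
        # Weights on hidden activations must be non-negative.
        self.Wz1 = np.abs(rng.normal(size=(hidden, hidden)))
        self.wz_out = np.abs(rng.normal(size=hidden))

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        z1 = relu(self.Wx0 @ x + self.b0)                  # convex in x
        z2 = relu(self.Wz1 @ z1 + self.Wx1 @ x + self.b1)  # still convex in x
        return float(self.wz_out @ z2 + self.wx_out @ x)   # convex scalar output
```

Because the output is convex in `x`, a constraint of the form `f(x) <= c` describes a convex set, which is one reason such networks can be attractive as surrogates inside an optimization problem.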
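Memristors, emerging non-volatile memory devices, have shown promising potential in neuromorphic hardware design, especially in spiking neural network (SNN) hardware implementations. Memristor-based SNNs have been successfully applied in a wide range of applications, including image classification and pattern recognition. However, implementing memristor-based SNNs for text classification is still under exploration. One of the main reasons is that training memristor-based SNNs for text classification remains challenging because of the lack of efficient learning rules and because of memristor non-idealities. To address these issues and accelerate research on memristor-based spiking neural networks in text classification applications, we develop a simulation framework with a virtual memristor array using an empirical memristor model. We use this framework to demonstrate a sentiment analysis task on the IMDB movie reviews dataset. We take two approaches to obtain trained spiking neural networks: 1) converting a pre-trained artificial neural network (ANN) to a memristor-based SNN, or 2) training a memristor-based SNN directly. These two approaches can be applied in two scenarios: offline classification and online training. Given a baseline training accuracy of 86.02% for the equivalent ANN, we achieve a classification accuracy of 85.88% by converting a pre-trained ANN to a memristor-based SNN. We conclude that similar classification accuracy can be achieved in simulation when moving from ANNs to SNNs and from non-memristive synapses to data-driven memristive synapses. We also investigate how global parameters such as spike train length, read noise, and the weight update stopping condition affect the neural networks in both approaches.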
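Object detectors trained with weak annotations are affordable alternatives to fully supervised counterparts. However, there is still a significant performance gap between them. We propose to narrow this gap by fine-tuning a pre-trained weakly supervised detector with a few fully annotated samples automatically selected from the training set using "box-in-box" (BiB), a novel active learning strategy designed specifically to address the well-documented failure modes of weakly supervised detectors. Experiments on the VOC07 and COCO benchmarks show that BiB outperforms other active learning techniques and significantly improves the performance of the base weakly supervised detector with only a few fully annotated images per class. BiB reaches 97% of the performance of fully supervised Fast RCNN with only 10% of fully annotated images on VOC07. On COCO, using on average 10 fully annotated images per class, or equivalently 1% of the training set, BiB also reduces the performance gap (in AP) between the weakly supervised detector and the fully supervised Fast RCNN by over 70%, showing a good trade-off between performance and data efficiency. Our code is publicly available at https://github.com/huyvvo/bib.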